An Expanded Taxonomy of Semiotic Classes for Text Normalization
Abstract
We describe an expanded taxonomy of semiotic classes for text normalization, building upon the work in [1]. We add a large number of categories of non-standard words (NSWs) that we believe a robust real-world text normalization system will have to be able to process. Our new categories are based upon empirical findings encountered while building text normalization systems across many languages, for both speech recognition and speech synthesis purposes. We believe our new taxonomy is useful both for ensuring high coverage when writing manual grammars and for eliciting training data to build machine learning-based text normalization systems.
Similar resources
Text Normalization System for Bangla
This paper describes a text normalization system for the Bangla language (exonym: Bengali) built by identifying the semiotic classes in a Bangla text corpus. After the semiotic classes were identified, a set of rules was written for tokenization and verbalization. This study is important for Text-to-Speech (TTS) systems as well as for creating language models used in speech recognition.
Normalization of Non-Standard Words in Croatian Texts
This paper presents text normalization, which is an integral part of any text-to-speech synthesis system. Text normalization is a set of methods whose task is to write non-standard words, such as numbers, dates, times, abbreviations, acronyms and the most common symbols, in their full expanded form. The whole taxonomy for classification of non-standard words in Croatian language togethe...
Myanmar Number Normalization for Text-to-Speech
Text normalization is an essential module of a Text-to-Speech (TTS) system, as TTS systems need to work on real text. This paper describes Myanmar number normalization designed for a Myanmar Text-to-Speech system. Semiotic classes for the Myanmar language are identified through the study of a Myanmar text corpus, and Weighted Finite State Transducer (WFST) based Myanmar number normalization is implemented. N...
Semiotic Analysis of Written Signs in the Road Sign Systems of Tehran City
Introduction: as a component of the urban landscape, road sign systems are among the most critical elements of urban environments. Generally speaking, the written signs dominate the design of these systems. These signs can also foster aesthetic and visual pleasure compellingly and innovatively. Furthermore, they perpetuate a specific image in the minds of their observers. This research seeks to...
Evaluation the theories of semiotics approach in the Reading of Architecture and Urbanism
This essay attempts to show how semiotic studies can be used as a perceptual tool in reading architecture and urbanism. The appearance of each work of art is similar to the creation of a “text” that carries with it a set of customs, values and thought. The production of each “text” is based on the context, culture and intellectual background of the society in which it originates. Each text is an ...
Publication date: 2017